Wednesday, October 05, 2016 


Institutional Repositories: Response to 
comments 


The introduction I wrote for the recent 
Q&A with Clifford Lynch has attracted 
some commentary from the institutional 
repository (IR) and open access (OA) 
communities. I thank those who took the 
time to respond. After reading the 
comments the following questions 
occurred to me. 


(A print version of this text is available here) 


Is the institutional repository dead or dying? 


Judging by the Mark Twain quote with which COAR’s Kathleen Shearer 
headed her response (“The reports of our death have been greatly 
exaggerated”), and judging by CORE’s Nancy Pontika insisting in her 
comment that we should not give up on the IR (“It is my strong belief that 
we don’t need to abandon repositories”) people might conclude that I had 
said the IR is dead. 


Indeed, by the time Shearer’s comments were republished on the 
OpenAIRE blog (under the title “COAR counters reports of repositories’ 
demise”) the wording had strengthened — Shearer was now saying that I had 
made a number of “somewhat questionable assertions, in particular that 
institutional repositories (IRs) have failed.” 


That is not exactly what I said, although I did quote a blog post by Eric Van 
de Velde (here) in which he declared the IR obsolete. As he put it, “Its 
flawed foundation cannot be repaired. The IR must be phased out and 
replaced with viable alternatives.” 


What I said (and about this Clifford Lynch seemed to agree, as do a growing 
number of others) is that it is time for the research community to take stock, 
and rethink what it hopes to achieve with the IR. 


It is however correct to say I argued that green OA has “failed as a 
strategy”. And I do believe this. I gave some of the reasons why I do in my 
introduction, the most obvious of which is that green OA advocates 
assumed that once IRs were created they would quickly be filled by 
researchers self-archiving their work. Yet seventeen years after the Santa Fe 
meeting, and 22 years after Stevan Harnad began his long campaign to 
persuade researchers to self-archive, it is clear there remains little or no 
appetite for doing so, even though researchers are more than happy to post 
their papers on commercial sites like Academia.edu and ResearchGate. 


However, I then went on to say that I saw two possible future scenarios for 
the IR. The first would see the research community “finally come together, 
agree on the appropriate role and purpose of the IR, and then implement a 
strategic plan that will see repositories filled with the target content 
(whatever it is deemed to be).” 


The second scenario I envisaged was that the IR would be “captured by 
commercial publishers, much as open access itself is being captured by 
means of pay-to-publish gold OA.” 


Neither of these scenarios assumes the IR will die, although they do 
envisage somewhat different futures for it. That said, what they could share 
in common is a propensity for the link between the IR and open access to 
weaken. Already we are seeing a growing number of papers in IRs being 
hidden behind login walls — either as a result of publisher embargoes or 
because many institutions have come to view the IR less as a way of making 
research freely available, more as a primary source of raw material for 
researcher evaluation and/or other internal processes. As IRs merge with 
Research Information Management (RIM) tools and Current Research 
Information Systems (CRIS) this darkening of the content in IRs could 
intensify. 


What makes this darkening likely is that the internal processes that IRs are 
starting to be used for generally only require the deposit of the metadata 
(bibliographic details) of papers, not the full-text. As such, the underlying 
documents may not just be inaccessible, but entirely absent. 


This outcome seems even more likely in my second scenario. Here the IR is 
(so far as research articles are concerned) downgraded to the task of linking 
users to content hosted on publishers’ sites. Again, to fulfil such a role the 
IR need host only metadata. 


So what is the role of an institutional repository? What should be 
deposited in it, and for what purpose? 


As I pointed out in my introduction, there is today no consensus on the role 
and purpose of the IR. Some see it as a platform for green OA, some view it 
as a journal publication platform, some as a metadata repository, some as a 
digital archive, some as a research data repository (I could go on). 


It is worth noting here a comment posted on my blog by David Lowe. The 
reason why the IR will persist, he said, “is not related to OA publishing as 
such, but instead to ETDs.” Presumably this means that Lowe expects the 
primary role of the IR to become that of facilitating ETD workflows. 


It turns out that ETDs are frequently locked behind login walls, as Joachim 
Schöpfel and Hélène Prost pointed out in a 2014 paper called Back to Grey: 
Disclosure and Concealment of Electronic Theses and Dissertations. “Our 
paper,” they wrote, “describes a new and unexpected effect of the 
development of digital libraries and open access, as a paradoxical practice 
of hiding information from the scientific community and society, while 
partly sharing it with a restricted population (campus).” 


And they concluded that the Internet “is not synonymous with openness, 
and the creation of institutional repositories and ETD workflows does not 
make all items more accessible and available. Sometimes, the new 
infrastructure even appears to increase barriers.” 


In short, the roles that IRs are expected to play are now manifold and 
sometimes they are in conflict with one another. One consequence of this is 
that the link between the repository and open access could become more 
and more tenuous. Indeed, it is not beyond the bounds of possibility that the 
link could break altogether. 


To what extent can we say that the IR movement — and the OAI-PMH 
standard on which it was based — has proved successful, both in terms 
of interoperability and deposit levels? 


As I said in my introduction, thousands of IRs have been created since 
1999. That is undoubtedly an achievement. On the other hand, many of 
these repositories remain half empty, and for the reasons stated above we 
could see them increasingly being populated with metadata alone. 


Both Shearer and Pontika agree that more could have been achieved with 
the IR. With regard to OAI-PMH Pontika says that while it has its 
disadvantages, “it has served the field well for quite some time now.” 


But what does serving the field well mean in this context? Let’s recall that 
the main reason for holding the Santa Fe meeting, and for developing OAI- 
PMH, was to make IRs interoperable. And yet interoperability remains more 
aspiration than reality today. Perhaps for this reason most research papers 
are now located by means of commercial search engines and Google 
Scholar, not OAI-PMH harvesters — a point Shearer conceded when I 
interviewed her in 2014. 
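

For readers unfamiliar with what harvesting involves, the following is a minimal sketch, not a production harvester. It assumes a hypothetical repository endpoint (`repository.example.edu`) and an invented sample response. The `ListRecords` verb, the `oai_dc` metadata prefix and the `resumptionToken` element are part of the OAI-PMH 2.0 protocol; the function names are illustrative only.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core metadata.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (helper name is illustrative)."""
    if resumption_token:
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text):
    """Return (records, next_token) parsed from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier",
                               default="", namespaces=OAI_NS),
            "titles": [t.text for t in rec.iterfind(".//dc:title", OAI_NS)],
            "links": [i.text for i in rec.iterfind(".//dc:identifier", OAI_NS)],
        })
    token = root.findtext(".//oai:resumptionToken",
                          default="", namespaces=OAI_NS)
    return records, (token or None)


# Invented sample response; a real harvester would fetch this over HTTP.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:identifier>https://repository.example.edu/handle/1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_records(SAMPLE)
print(list_records_url("https://repository.example.edu/oai"))
print(records[0]["titles"], token)
```

Note that the record in this sample carries a title and a link to a landing page, but nothing that guarantees a full-text file sits behind that link. That is the kind of gap between harvested metadata and accessible content discussed above.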


Of course, if running an IR becomes less about providing open access and 
more about enabling internal processes, or linking to papers hosted 
elsewhere, interoperability begins to seem unnecessary. 


Do IR advocates now accept that there is a need to re-think the 
institutional repository, and is the IR movement about to experience a 
great leap forward as a result? 


Most IR advocates do appear to agree that it is time to review the current 
status of the institutional repository, and to rethink its role and purpose. And 
it is the Confederation of Open Access Repositories (COAR) that is leading 
on this. 


“The calls for a fundamental rethink of repositories is already being 
answered!” Tony Ross-Hellauer — scientific manager at OpenAIRE (a 
member of COAR) — commented on my blog. “See the ongoing work of 
the COAR next-generation repositories working group.” 


Shearer, who is the executive director of COAR (and so presumably 
responsible for the working group), explains in her response that the group 
has set itself the task of identifying “the core functionalities for the next 
generation of repositories, as well as the architectures and technologies 
required to implement them.” 


As a result, Shearer says, the IR community is “now well positioned to offer 
a viable alternative for an open and community led scholarly 
communication system.” 


So all is well? Not everyone thinks so. As an anonymous commenter 
pointed out on my blog: “All this is not really offering a new way and more 
like reacting to the flow. Maybe that has to do with the kind of people 
working on it, the IR crowd is usually coming from the library field and 
their job is not to be inventive but to archive and keep stuff save.” 


Archiving and keeping stuff save are very worthy missions, but it is to for- 
profit publishers that people tend to turn when they are looking for 
inventive solutions, and we can see that legacy publishers are now keen to 
move into the IR space. This suggests that if the goal is to create a 
community-led scholarly communications system COAR’s initiative could 
turn out to be a case of shutting the stable door after the horse has bolted. 


What is the most important task when seeking to engineer radical 
change in scholarly communication: articulating a vision, providing 
enabling technology, or getting community buy-in? 


“Ultimately, what we are promoting is a conceptual model, not a 
technology,” says Shearer. “Technologies will and must change over time, 
including repository technologies. We are calling for the scholarly 
community to take back control of the knowledge production process via a 
distributed network based at scholarly institutions around the world.” 


Shearer adds that the following vision underlies COAR’s work: 


“To position distributed repositories as the foundation of a globally 
networked infrastructure for scholarly communication that is 
collectively managed by the scholarly community. The resulting 
global repository network should have the potential to help 
transform the scholarly communication system by emphasizing the 
benefits of collective, open and distributed management, open 
content, uniform behaviors, real-time dissemination, and collective 
innovation.” 


As such, I take it that COAR is seeking to facilitate the first scenario I 
outlined. But were not the above objectives those of the attendees of the 
1999 Santa Fe meeting? Yet seventeen years later we are still waiting for 
them to be realised. Why might it be different this time around, especially 
now that legacy publishers are entering the market for IR services, and 
some universities seem minded to outsource the hosting of research papers 
to commercial organisations, rather than work with colleagues in the 
research community to create an interoperable network of distributed 
repositories? 


What has also become apparent over the past 17 years is that open 
movements and initiatives focused on radical reform of scholarly 
communication tend to be long on impassioned calls, petitions and visions, 
short on collective action. 


As NYU librarian April Hathcock put it when reporting on a Force11 
Scholarly Commons Working Group she attended recently: “As several of 
my fellow librarian colleagues pointed out at the meeting, we tend to 
participate in conversations like this all the time and always with very 
similar results. The principles are fine, but to me, they’re nothing new or 
radical. They’re the same things we’ve been talking about for ages.” 


Without doubt, articulating a vision is a good and necessary thing to do. But 
it can only take you so far. You also need enabling technology. And here we 
have learned that “there is many a slip ’twixt the cup and the lip.” OAI-PMH 
has not delivered on its promise, as even Herbert Van de Sompel, one of the 
architects of the protocol, appears to have concluded. (Although this tweet 
suggests that he too does not agree with the way I characterised the current 
state of the IR movement). 


Shearer is of course right to say that technologies have to change over time. 
However, choosing the wrong one can derail, or significantly slow down, 
the objective you are working towards. 


But even if you have articulated a clear and desirable vision, and you have 
put the right technology in place, in the generally chaotic and anarchic 
world of scholarly communication you can only hope to achieve your 
objectives if you get community buy-in. That is what the IR and self- 
archiving movements have surely demonstrated. 


To what extent are commercial organisations colonising the IR 
landscape? 


In my introduction I said that commercial publishers are now actively 
seeking to colonise and control the repository (a strategy supported by their 
parallel activities aimed at co-opting gold open access). As such, I said, the 
challenge the IR community faces is now much greater than in 1999. 


In her response, Shearer says that I mischaracterise the situation. “[T]here 
are numerous examples of not-for-profit aggregators including BASE, 
CORE, SemanticScholar, CiteSeerX, OpenAIRE, LA Referencia and 
SHARE (I could go on),” she said. “These services index and provide 
access to a large set of articles, while also, in some cases, keeping a copy of 
the content.” 


In fact, I did discuss non-profit services like BASE and OpenAIRE, as well 
as PubMed Central, HAL and SciELO. In doing so I pointed out that a high 
percentage of the large set of articles that Shearer refers to are not actually 
full-text documents, but metadata records. And of the full-text documents 
that are deposited, many are locked behind login walls. In the case of 
BASE, therefore, only around 60% of the records it indexes provide access 
to the full-text. 


In addition, many consist of non-peer-reviewed and non-target content such 
as blog posts. That’s fine, but this is not the target content that OA advocates 
say they want to see made open access. Indeed, in some cases a record may 
consist of no more than a link to a link (e.g. see the first item listed here). 


So the claims that these services make about indexing and providing access 
to a large set of articles need to be taken with a pinch of salt. 


It is also important to note that publishers are at a significant advantage 
here, since they host and control access to the full-text of everything they 
publish. Moreover, they can provide access to the version of record (VoR) 
of articles. This is invariably the version that researchers want to read. 


It also means that publishers can offer access both to OA papers as well as 
to paywalled papers, all through the same interface. And since they have the 
necessary funds to perfect the technology, publishers can offer more and 
better functionality, and a more user-friendly interface. For this reason, I 
suggested, they will soon be (and indeed some already are) charging for 
services that index open content, as I assume Elsevier plans to do with the 
DataSearch service it is developing. This seems to me to be a new form of 
enclosure of the commons. 


Shearer also took me to task for attaching too much significance to the 
partnership between Elsevier and the University of Florida, in which the 
University has agreed to outsource access to papers indexed in its repository 
to Elsevier. I suggested that by signing up to deals like this, universities will 
allow commercial publishers to increasingly control and marginalise IRs. 
This is an exaggeration, says Shearer: “[O]ne repository does not make a 
trend.” 


I agree that one swallow does not a summer make. However, summer does 
eventually arrive, and I anticipate that the agreement with the University of 
Florida will prove the first swallow of a hot summer. Other swallows will 
surely follow. 


Consider, for instance, that the University of Florida has also signed a Letter 
of Agreement with CHORUS in a pilot initiative intended to scale up the 
Elsevier project “to a multilateral, industry effort.” 


In addition to Elsevier, publishers involved in the pilot include the American 
Chemical Society, the American Physical Society, The Rockefeller 
University Press and Wiley. Other publishers will surely follow. 


And just last week it was announced that Qatar University Library has 
signed a deal with Elsevier that apes the one signed by the University of 
Florida. I think we can see a trend in the making here. 


As things stand, therefore, it is not clear to me how initiatives like COAR 
and SHARE can hope to match the collective power of legacy publishers 
working through CHORUS. 


Let’s recall that OA advocates long argued that legacy publishers would 
never be able to replicate in an OA environment the dominance they have 
long enjoyed in the subscription world. As a result, it was said, as open 
access commodifies the services they provide publishers will experience a 
downward pressure on prices. In response, they will either have to downsize 
their operations, or get out of the publishing business altogether. Today we 
can see that legacy publishers are not only prospering in the OA 
environment, but getting ever richer as their profits rise — all at the expense 
of the taxpayer. 


But let me be clear: while I fear that legacy publishers are going to co-opt 
both OA and IRs, I would much prefer they did not. Far better that the 
research community — with the help of non-profit concerns — succeeded in 
developing COAR’s “viable alternative for an open and community led 
scholarly communication system.” 


So I applaud COAR’s initiative and absolutely sign up to its vision. My 
doubts are that, as things stand, that vision is unlikely to be realised. For it 
to happen I believe more dramatic changes would be needed than the OA 
and IR movements appear to assume, or are working towards. 


Will the IR movement, as with all such attempts by the research 
community to take back control of scholarly communication, inevitably 
fall victim to a collective action dilemma? 


Let me here quote Van de Sompel, one of the key architects of OAI-PMH. 
Van de Sompel, I would add, has subsequently worked on OAI-ORE (which 
Lynch mentions in the Q&A) and on ResourceSync (which Shearer 
mentions in her critique). 


In a retrospective on repository interoperability efforts published last year 
Van de Sompel concluded, “Over the years, we have learned that no one is 
‘King of Scholarly Communication’ and that no progress regarding 
interoperability can be accomplished without active involvement and buy-in 
from the stakeholder communities. However, it is a significant challenge to 
determine what exactly the stakeholder communities are, and who can act as 
their representatives, when the target environment is as broad as all nodes 
involved in web-based scholarship. To put this differently, it is hard to know 
how to exactly start an effort to work towards increased interoperability.” 


The larger problem here, of course, is the difficulties inherent in trying to 
get the research community to co-operate. 


This is the problem that afflicts all attempts by the research community to, 
in Shearer’s words, “take back control of the knowledge production 
process.” What inevitably happens is that they bump up against what John 
Wenzler, Dean of Libraries at California State University, has described as a 
“collective action dilemma”. 


But what is the solution? Wenzler suggests the research community should 
focus on trying to control the costs of scholarly communication. Possible 
ways of doing this, he says, could include requiring pricing transparency and 
lobbying for government intervention and regulation. (“[T]he government 
can try to limit a natural monopoly’s ability to exploit its customers by 
regulating its prices instead.”) 


He concedes however: “Currently, the dominant political ideology in 
Western capitalist countries, especially in the United States, is hostile to 
regulation, and it would be difficult to convince politicians to impose prices 
on an industry that hasn’t been regulated in the past.” 


He adds: “Moreover, even if some kind of International Publishing 
Committee were created to establish price rates, there is a chance that 
regulators would be captured by publisher interests.” 


It is worth recalling that while OA advocates have successfully persuaded 
many governments to introduce open access/public access policies, this has 
not put control of the knowledge production process back into the hands of 
the research community, or reduced prices. Quite the reverse: it is 
(ironically) increasing the power and dominance of legacy publishers. 


In short, as things stand if you want to make a lot of money from the 
taxpayer you could do no better than become a scholarly publisher! 


I don’t like being the eternal pessimist. I am convinced there must be a way 
of achieving the objectives of the open access and IR movements, and I 
believe it would be a good thing for that to happen. Before it can, however, 
these movements really need to acknowledge the degree to which their 
objectives are being undermined and waylaid by publishers. And rather than 
just repeating the same old mantras, and recycling the same visions, they 
need to come up with new and more compelling strategies for achieving 
their objectives. I don’t claim to know what the answer is, but I do know 
that time is not on the side of the research community here. 


Posted by Richard Poynder at 10:55 


3 comments: 


Stevan Harnad said... 


Repositories vs. Quasitories, or Much Ado About Next To Nothing: I 


“I have a feeling that when Posterity looks back at the last decade of the 
2nd A.D. millennium of scholarly and scientific research on our planet, it 
may chuckle at us.... I don't think there is any doubt in anyone's mind as 
to what the optimal and inevitable outcome of all this will be: The Give- 
Away literature will be free at last online, in one global, interlinked virtual 
library... and its [peer review] expenses will be paid for up-front, out of the 
[subscription cancelation] savings. The only question is: When? This 
piece is written in the hope of wiping the potential smirk off Posterity's 
face by persuading the academic cavalry, now that they have been led to 
the waters of self-archiving, that they should just go ahead and drink!” 
(Harnad, 20th century) 


Richard Poynder notes that 17 years on, Institutional Repositories (IRs) 
are still half-empty of their target content: peer-reviewed research journal 
articles. 


He is right. Most researchers are still not doing the requisite keystrokes 
to deposit their peer-reviewed papers (and their frantic librarians' efforts 
are no substitute). 


The reason is that researchers’ institutions and funders still have not got 
their heads around the right deposit mandates. 


They will, but they will not get historic credit for having done it as soon as 
they could have. 


Richard also says authors are more willing to deposit in Academia.edu 
and ResearchGate. 


Not true. In percentage terms those central Quasitories are doing just as 
badly as IRs. But their visible recruiting efforts (software that keeps 
reminding and cajoling authors) is clever, and something along the same 
lines should be adopted as part of funder and especially institutional 
deposit mandates. (Keystrokes are keystrokes, whether done for one's 
own institutional repository or a third party Quasitory.) 


The biggest Quasitory of all is the Virtual Quasitory called Google 
Scholar (GS). GS has mooted most of the fuss about interoperability 
because it full-text-inverts all content. It's a nuclear weapon, but it is in 
no hurry. Unlike institutions and funders, GS is under no financial 
pressure. And unlike publishers, it does not have the ambition or the 
need to capture and preserve publishers’ obsolete, parasitic functions 
(even though, unlike publishers, GS is in an incomparably better position 
to maximise functionality on the web). GS is waiting patiently for the 
research community to get its act together. 


Institutions and funders are not just sluggish in adopting and optimizing 
their deposit mandates but they are making Faustian Little Deals with 
their parasites, prolonging their longstanding dysfunctional bondage. 


October 06, 2016 11:26 am 


Stevan Harnad said... 


Repositories vs. Quasitories, or Much Ado About Next To Nothing: II 


Can't blame publishers for striving at all costs to keep making a buck, 
even if they no longer really have any essential service or expertise to 
offer (other than managing peer review). Publishers’ last resort for 
clinging to their empty empire is the OA embargo -- for which the 
antidote -- the eprint-request button (the IR's functional equivalent of 
Academia.edu and ResearchGate) -- is already known; it's just waiting to 
be used, along with effective deposit mandates. 


As to why it's all taking so excruciatingly long: I'm no good at sussing 
that out, and besides, Alma Swan has forbidden me even to give voice to 
my suspicion, beyond perhaps the first of its nine letters: S. 


Petr Knoth said... 


I just wanted to add that CORE, listed among the OAI-PMH harvesters, 
is a free not-for-profit service, which indexes and keeps a cached copy of 
research papers harvested from repositories and journals via OAI-PMH 
(and other protocols). It carries out no pulling of full-texts at the time of 
access, unless specifically requested by the user, and the user is also 
not going to hit a paywall. CORE has close to 4.5 million full-texts 
available and about 37 million metadata records. The access to the full- 
text content is, as opposed to other commercial services, provided both 
via a user interface as well as an API or data dumps. 


My argument here is that I don't think that it can be said that OAI-PMH 
has failed and does not enable interoperability or the development of 
aggregations. Clearly this is possible. However, I have highlighted before 
certain issues with OAI-PMH that make interoperability more difficult to 
achieve, see (Knoth, 2013). I believe the way forward is to constructively 
address these issues through the development of common practice or 
better open protocols. This approach is completely different from the 
strategy of most existing commercial services that create solutions on 
top of which it is almost impossible to develop anything new. For 
example, in the domain of text mining research papers, there is no 
commercial service providing an acceptable solution to the provision of 
research papers. The role of repositories in enabling this (and other) 
important use cases should not be underestimated. In fact, achieving 
interoperability across the content from publishers is an order of 
magnitude more complicated than across repositories, see (Knoth & 
Pontika, 2016). 